Showee’s Winning Job List

[Quantified Self ver. 2022]


Hi there! My name is Sho’ and I am a Data Analyst.
Welcome to my analysis!


I am aspiring to land a job as a Data Analyst, with the prospect of becoming a CDO (Chief Data Officer) down the line. I have:

  1. experience in day-to-day business analysis in various fields (finance, education, legal), and
  2. hands-on skills and in-depth methodology from my academic training (MBA, MS)

that meet the requirements of most data analyst job descriptions. However, the hiring process that major corporations have recently adopted (from ATS resume filtering to fake interviews with “seniors”) has been a real struggle for me, with no luck so far. In full, shameless disclosure: I have submitted more than 200 job applications this year without being offered an ideal full-time job. Yes, frankly, I am at the point where my friends are worried.

The good side is that I now have enough data to quantify my job search, because I have been tracking the progress of every application in a Google Sheet. I named it “Showee’s Winning Job List” (yeah, I said it), and through this desperate experience it has grown to 300+ observations. As a data analyst, I found this spin-off a serendipitous resource and decided to work out alternative approaches and plan my next moves. See, data analysts are never bored as long as there is data out there to grab (which, as you know, will never run out).

This analysis is also meant to showcase my data visualization skills, so I use various types of charts. As a disclaimer, they may seem a bit awkward or inconsistent as a whole. Still, I tried to keep them under control by using blue as the theme color, which is also why I am wearing a blue shirt in my picture above. I am also leaving my code here for my future self and for anyone interested in my thought process and choice of visualizations. All code is written in R, and the layout of this page was created with R Markdown.

OK, enough of an introduction. Let’s get started.


Loading packages necessary for the entire project.

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("waffle")) install.packages("waffle")
if (!require("googlesheets4")) install.packages("googlesheets4")
if (!require("wordcloud")) install.packages("wordcloud")
if (!require("lubridate")) install.packages("lubridate")
if (!require("knitr")) install.packages("knitr")
if (!require("ggbeeswarm")) install.packages("ggbeeswarm")
if (!require("plotly")) install.packages("plotly")
if (!require("gganimate")) install.packages("gganimate")
if (!require("magick")) install.packages("magick")

library(tidyverse)
library(waffle)
library(googlesheets4)
library(wordcloud)
library(lubridate)
library(knitr)
library(ggbeeswarm)
library(plotly)
library(gganimate)
library(magick)


Importing Dataset

My data can be obtained as shown below. I used the {googlesheets4} package to import the data from the target Google Sheet. Conditional formatting extends to the very bottom of the sheet, so read_sheet() returns all 1,000 rows. I therefore remove the practically empty rows.


Importing from Google Sheet

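# The sheet is public, so no Google authentication is needed
gs4_deauth()

job_list <- read_sheet("https://docs.google.com/spreadsheets/d/1ug6rRgsNRyvToBPFtaATSiPkmAgOPLVlrBx_tJGjih8/edit#gid=0")

# Conditional formatting pads the sheet to 1,000 rows; keep only rows that have a Title
job_list <- job_list %>%
  filter(!is.na(Title))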
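The filter above keys on Title alone. A more general alternative is to drop only the rows in which every column is empty. The sketch below uses base R on a small hypothetical data frame; the columns are illustrative and not the full sheet schema.

```r
# Toy data frame mimicking a sheet padded with blank rows by its formatting
# (hypothetical values; only two of the sheet's columns are shown)
sheet <- data.frame(
  Title   = c("Data Analyst", "Business Analyst", NA, NA),
  Applied = as.Date(c("2022-03-03", "2022-04-14", NA, NA))
)

# Keep rows where at least one column holds a value
non_empty <- sheet[rowSums(!is.na(sheet)) > 0, , drop = FALSE]
print(nrow(non_empty))
```

Unlike a Title-only filter, this keeps a row that has a missing Title but other filled-in columns. Which behavior is preferable depends on how the sheet is maintained.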


Wordcloud from Company Names


The filtered data frame job_list2022 holds only the 2022 applications and is used wherever I analyze 2022 alone.

Filtering for 2022

job_list2022 <- job_list %>%
  filter(year(Applied) == 2022)


Data Dictionary

Showee’s Winning Job List has 19 variables. It did not start out with 19; I added columns as my strategy expanded.

For example, quite a few people advised me to find a recruiter at the company on LinkedIn once I submitted my application, and then follow up with an introductory message via LinkedIn messaging. Finding the right person is not easy, but at some point I followed the advice and added a column (Follow-up) to see whether it really works.


| Variable | Description |
|---|---|
| Title | Title of the position as posted |
| Position | 9 categories: Data Analyst, Business Analyst, Financial Analyst, Research Analyst, Project Manager, Procurement, Analyst (other), Data (other), Other |
| Ad URL | The URL where I found the ad |
| Company Name | Company name |
| Company Profile | Link to the company’s profile on the search platform, which I use to evaluate credibility |
| Industry | Industry as described by the search platform |
| Company Website | The company’s official website |
| Type | 5 categories: Internship, Part-Time, Full-Time, Contract, Temporary |
| Paid | Logical; I initially considered unpaid jobs for the experience |
| Found on | The platform where I found the ad |
| action | Initially meant for a brief note on my last action for the ad; currently only used to record whether I applied |
| Resume Type | Initially I had 3 types of resume prepared: a designed resume, a simple 2-page resume, and a long CV-style resume with ALL of my work experience. In the end, I have only used the 2-page resume, which I update frequently |
| Cover Letter | 3 options: not required, prepared (attached), and creating. I have only used the first two |
| Applied | Date applied |
| Status | 6 application statuses: Awaiting Response, Pending, Rejected, In Progress, Ad Closed, Job Offered |
| last update | Date the application was last updated |
| Notes | Free-form comments and remarks |
| Follow-up | 3 methods of following up with the company recruiter: (LinkedIn) messaging, email, other |
| Last in Process | 3 levels for the furthest stage the application reached: Phone Screening, Hiring Manager, More |
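Several columns in the dictionary have a fixed set of levels, so a typo entered in the sheet would quietly create a new category. One way to catch this is to convert the column to a factor with the documented levels; any value outside them becomes NA. The sketch below uses the six Status levels from the table. The sample entries, including the misspelled one, are hypothetical.

```r
# The six Status levels listed in the data dictionary
status_levels <- c("Awaiting Response", "Pending", "Rejected",
                   "In Progress", "Ad Closed", "Job Offered")

# Hypothetical Status entries, one of them misspelled
status_raw <- c("Rejected", "Awaiting Response", "Rejectd", "Job Offered")

status <- factor(status_raw, levels = status_levels)

# Values outside the documented levels turn into NA, flagging them for cleanup
bad <- status_raw[is.na(status)]
print(bad)

# table() also reports levels with zero observations
table(status)
```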


Overall, how many applications did I submit in 2022?


Counting Rows

kable(nrow(job_list2022), col.names = "Total Number of Job Applications in 2022", align = "cc")
Total Number of Job Applications in 2022
224

To be honest, it feels like I applied for a lot more. I guess that is because each application takes more energy than any other laborious work I do.

*For reference, as of March 25, 2023, my application pace has become more intense. The number of applications for 2023 alone is as follows.

job_list2023 <- job_list %>%
  filter(year(Applied) == 2023)

kable(nrow(job_list2023), col.names = "Total Number of Job Applications in 2023", align = "cc")
Total Number of Job Applications in 2023
86



Job Titles I Applied For

First, let’s see what job titles I applied for. As I said, my ideal title is Data Analyst. That does not necessarily mean the role has to be in a data team or department, as long as the position allows me to access organizational data and analyze it to help the organization develop business solutions. Thus the positions I applied for are not limited to “Data Analyst.”

I sorted the positions that I applied for into 9 categories, namely:

  • Data Analyst,
  • Business Analyst,
  • Financial Analyst,
  • Research Analyst,
  • Project Manager,
  • Procurement,
  • Analyst (other),
  • Data (other), and
  • Other.

For reference, Procurement was my area of expertise before I started my academic journey. I would happily contribute that knowledge to a procurement role if it focuses mainly on data analysis.

Here is a visualization of the positions I was motivated to apply for.

Waffle Chart

job_waffle <- job_list %>%
  group_by(Position) %>%
  summarize(position_count = n()) %>%
  mutate(ratio = ceiling(position_count / sum(position_count) * 100)) %>%
  arrange(desc(position_count))

positions <- job_waffle$ratio
names(positions) <- job_waffle$Position



waffle(positions, rows = 5, colors = c("navy", "deepskyblue1", "deepskyblue4", "blue2", "cyan1", "darkblue", "cadetblue2", "deepskyblue3", "cadetblue" )) +
  labs(title = "Portion of Applied Positions")
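In the waffle code above, each share is rounded up with ceiling(). The squares can therefore add up to slightly more than 100. A common alternative is largest-remainder rounding: round every share down, then give the missing squares to the categories with the largest fractional parts. The sketch below is a hypothetical helper, not part of the {waffle} package, and the counts are made up for illustration.

```r
# Hypothetical helper: whole squares that sum exactly to `total`
largest_remainder <- function(counts, total = 100) {
  raw   <- counts / sum(counts) * total   # exact fractional shares
  base  <- floor(raw)                      # round every share down first
  short <- total - sum(base)               # squares still missing
  # give the missing squares to the largest fractional remainders
  idx <- order(raw - base, decreasing = TRUE)[seq_len(short)]
  base[idx] <- base[idx] + 1
  base
}

# Hypothetical position counts (not the real data)
counts <- c("Data Analyst" = 40, "Business Analyst" = 19, "Other" = 8)
sq <- largest_remainder(counts)
sq

# For comparison, ceiling() overshoots to 101 squares here
sum(ceiling(counts / sum(counts) * 100))
```

With the real data, something like `positions <- largest_remainder(setNames(job_waffle$position_count, job_waffle$Position))` could replace the ceiling-based ratios before they are passed to waffle().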

Apparently, I applied for Data Analyst jobs most frequently. But as I delved into data analyst roles and careers, my interest also grew in business analyst roles. In those roles, my business expertise from my MBA and corporate experience could be put to good use and give me an advantage. In this section, I compare the two to see whether my applications for Business Analyst roles have increased.

Cumulated Number of Applications over Time

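da_ba <- job_list2022 %>%
  filter(Position %in% c("Data Analyst", "Business Analyst")) %>%
  group_by(Position, Applied) %>%
  summarize(Applications = n(), .groups = "drop_last")  # result stays grouped by Position

# Running total within each Position (rows are already in date order within each group)
da_ba_cum <- da_ba %>%
  mutate(cum_app = cumsum(Applications)) %>%
  arrange(Applied)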

cumline_daba <- ggplot(da_ba_cum, aes(x = Applied, y = cum_app, color = Position)) +
  geom_line(size = 2) +
  labs(title = "Cumulated Number of Applications", subtitle = "Data Analyst vs. Business Analyst", y = "Cumulated Number of Applications", x = "Applied Date (2022)") +
  scale_color_manual(values = c("Data Analyst" = "navy", "Business Analyst" = "deepskyblue1")) +
  theme_minimal() +
  theme(legend.position = "bottom")
ggplotly(cumline_daba) %>%
  layout(legend = list(orientation = "h", x = 0.25, y =-0.2))


From the beginning of this year until now, the cumulative number of Business Analyst applications has stayed at roughly half of the Data Analyst applications. As you can see, though, right before July my Business Analyst applications almost caught up with my Data Analyst applications. That may be why I felt I had become more attracted to Business Analyst role descriptions. But the data show a consistently lower number of applications for Business Analyst roles. It is a good example of how perception at a single point in time does not always match reality, isn’t it?

Joining Datasets and Adding Animation

full_days <- as.data.frame(seq(min(da_ba$Applied), max(da_ba$Applied), "days"))
colnames(full_days) <- "Date"

da <- da_ba %>%
   filter(Position == "Data Analyst") %>%
  group_by(Applied)

ba <- da_ba %>%
   filter(Position == "Business Analyst") %>%
  group_by(Applied)

da_full <- full_days %>%
  left_join(da, by = c("Date" = "Applied")) %>%
  replace_na(list(Position = "Data Analyst", Applications = 0)) %>%
  mutate(cum_app = cumsum(Applications)) %>%
  select(Date, Position, cum_app)
  
  
ba_full <- full_days %>%
  left_join(ba, by = c("Date" = "Applied")) %>%
  replace_na(list(Position = "Business Analyst", Applications = 0)) %>%
  mutate(cum_app = cumsum(Applications)) %>%
  select(Date, Position, cum_app)
  
daba_full <- ba_full %>%
  inner_join(da_full, by = "Date") %>%
  mutate(ba_cum_pct = cum_app.x / (cum_app.x + cum_app.y)*100, da_cum_pct = cum_app.y / (cum_app.x + cum_app.y)*100) %>%
  select(Date, Position.x, ba_cum_pct, Position.y, da_cum_pct)

daba_cum_full_long <- rbind(
    (select(daba_full, Date, Position = Position.x, cum_pct = ba_cum_pct))
    , (select(daba_full, Date, Position = Position.y, cum_pct = da_cum_pct))) %>%
  arrange(Date)

ggplot(daba_cum_full_long, aes(x = Date, y = cum_pct, color = Position, fill = Position)) +
  geom_bar(stat = "identity", position = position_fill(reverse = T), aes(color = Position)) +
  geom_hline(yintercept = .5) +
  scale_fill_manual(values = c("deepskyblue1", "navy")) +
  scale_color_manual(values = c("deepskyblue1", "navy")) +
  transition_time(Date) +
  shadow_wake(wake_length = 1, alpha = 0.8, wrap = FALSE) +
  theme(legend.position = "bottom") +
  labs(title = "Portion of Applications for Data Analyst vs. Business Analyst")
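The full_days join above matters for the animation. Without it, days with no applications have no rows, so the frames would skip those days and the running totals would have gaps. The base R sketch below shows the same idea on hypothetical daily counts: build the full calendar, left-join, treat missing days as zero, and only then take the cumulative sum.

```r
# Hypothetical daily counts: only days with at least one application appear
apps <- data.frame(
  Applied      = as.Date(c("2022-03-03", "2022-03-05", "2022-03-08")),
  Applications = c(2, 1, 3)
)

# Full calendar between the first and last application dates
calendar <- data.frame(Date = seq(min(apps$Applied), max(apps$Applied), by = "day"))

# Left join (merge() sorts by Date), then treat days without a row as zero
full <- merge(calendar, apps, by.x = "Date", by.y = "Applied", all.x = TRUE)
full$Applications[is.na(full$Applications)] <- 0
full$cum_app <- cumsum(full$Applications)
full
```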


As you can see, my first Business Analyst application came a little later than the start of my Data Analyst search (4/14/2022, to be exact). Since then, the cumulative share has been fairly stable. Counting from 3/3/2022, when my first Data Analyst application was recorded, the current ratio of total Business Analyst applications to Data Analyst applications is:

Finding the Ratio

bd_pp <- daba_cum_full_long %>%
  filter(Date == max(da_ba$Applied)) %>%
  select(Position, cum_pct) 

kable(bd_pp[1, 2]/bd_pp[2, 2], col.names = "Business Analyst / Data Analyst", align = "cc")
Business Analyst / Data Analyst
0.4786325
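Both cumulative percentages in bd_pp share the same denominator. Writing $n_{BA}$ and $n_{DA}$ for the cumulative application counts, their ratio therefore reduces to the ratio of the counts:

```latex
\frac{p_{BA}}{p_{DA}}
  = \frac{100\, n_{BA} / (n_{BA} + n_{DA})}{100\, n_{DA} / (n_{BA} + n_{DA})}
  = \frac{n_{BA}}{n_{DA}}
```

So 0.4786325 corresponds to roughly one Business Analyst application for every 1 / 0.4786325 ≈ 2.09 Data Analyst applications.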

On second thought, this really reminded me why I decided to study data analysis during my MBA program. I wanted more persuasive ways to make my business analysis applicable in the real world by backing the insights from my business experience with data.


Overall, how many applications did I send on each of my designated search days?

Search Count Bins in Histogram

# Applications per search day
app <- job_list2022 %>%
  count(Applied, name = "Applications")

ggplot(app, aes(Applications)) +
  geom_histogram(binwidth = 1, fill = "cornflowerblue") +
  labs(title = "Number of Applications Submitted in a Day",
       x = "Number of Applications", y = "Days") +
  theme_minimal()
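The histogram above is a table of a table: first count the applications per date, then count how many days share each count. The base R sketch below reproduces those bar heights on hypothetical dates. As in the chart, days with zero applications do not appear.

```r
# Hypothetical application dates (one entry per submitted application)
applied <- as.Date(c("2022-03-03", "2022-03-03", "2022-03-05",
                     "2022-03-08", "2022-03-08", "2022-03-08"))

# Applications per search day
per_day <- table(applied)

# Number of days on which 1, 2, 3, ... applications were sent:
# these are the bar heights of a binwidth-1 histogram
days_by_count <- table(as.vector(per_day))
days_by_count

# Average applications per search day
mean(as.vector(per_day))
```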


Job Search Platforms

Initially, I was randomly applying for whatever I saw with the words “Data” and “Analyst” in it. At some point, though, I started receiving unsolicited emails and calls from overseas recruiters (based in India) who were obviously not in this country. I naively sent my resume a couple of times, and it was then circulated within their network. I later found out that they create job descriptions to match my profile just to obtain my resume and personal information. So I narrowed my options down to a mere 12 platforms (plus referrals) to send my resume through:

  • LinkedIn,
  • Indeed,
  • Glassdoor,
  • WayUp,
  • HandShake (university resource),
  • Company Site (the company’s own website),
  • DataCamp,
  • DVS (professional community job board),
  • DAA (professional community job board),
  • Google/Coursera (for Google/Coursera Certificate holders),
  • CUNY (university resource),
  • NYC (city office employment),
  • Referral.


First, I’d like to see which platforms I used most often to find the positions I applied for.

pf_raw <- job_list2022 %>%
  group_by(pf = `Found on`) %>%
  summarize(pf_count = n()) %>%
  arrange(desc(pf_count))


  
ggplot(pf_raw, aes(x = reorder(pf, pf_count), y = pf_count, fill = pf)) +
  geom_segment(aes(xend = pf, yend = 0)) +
  geom_point(show.legend = F, size = 4, aes(color = pf))+
  geom_col(show.legend = F) +
  coord_flip() +
  scale_fill_manual(values = c("LinkedIn" = "#0077B5", "Indeed" = "#003A9B", "Referral" = "cornflowerblue", "DAA" = "cornflowerblue")) +
  scale_color_manual(values = c("LinkedIn" = "#0077B5", "Indeed" = "#003A9B", "Referral" = "cornflowerblue", "DAA" = "cornflowerblue")) +
  labs(title = "Platforms Used to Find the Applied Positions", subtitle = "", x = NULL, y = "Number of Applications Submitted", caption = "Application submission platforms may differ") +
  theme_minimal()


How Many Ghosts?

Number of applications with no update for more than 90 days

Filtering and Summarize

# No response: `last update` is empty and the status is still "Awaiting Response"
no_res <- job_list2022 %>%
  filter(is.na(`last update`) & Status == "Awaiting Response") %>%
  summarize(nr = n()) %>%
  mutate(`% of No Responses` = round(nr / nrow(job_list2022) * 100, 2)) %>%
  select(`No Response` = nr, `% of No Responses`)

# Ghosted: no response, and applied more than 90 days ago
# (as.Date() keeps both sides of the comparison in the Date class)
ghosts <- job_list2022 %>%
  filter(as.Date(Applied) <= Sys.Date() - 90 & is.na(`last update`) & Status == "Awaiting Response") %>%
  summarize(Ghosted = n()) %>%
  mutate(`% of Ghosts` = round(Ghosted / nrow(job_list2022) * 100, 2)) %>%
  select(Ghosted, `% of Ghosts`)



kable(cbind(no_res, ghosts), col.names = c("Total No Responses", "% of No Responses", "Ghosted (no response more than 90 days)", "% of Ghosts"), align = "cc")
| Total No Responses | % of No Responses | Ghosted (no response more than 90 days) | % of Ghosts |
|---|---|---|---|
| 110 | 49.11 | 110 | 49.11 |

*In hindsight: all of the 2022 applications were submitted more than 90 days ago, so every unanswered one counts as ghosted.
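Because the ghost count above is measured against Sys.Date(), it changes every time the page is knitted. The base R sketch below applies the same rule (no update, status still awaiting a response, applied more than 90 days before a cutoff) against a fixed reference date, so the result is reproducible. The records and the reference date are hypothetical.

```r
# Hypothetical records; last_update is NA when I never heard back
apps <- data.frame(
  Applied     = as.Date(c("2022-05-01", "2022-11-20", "2022-12-15")),
  last_update = as.Date(c(NA, NA, "2022-12-20")),
  Status      = c("Awaiting Response", "Awaiting Response", "Rejected")
)

# Fixed reference date instead of Sys.Date() so the count does not drift
as_of <- as.Date("2023-01-31")

no_response <- is.na(apps$last_update) & apps$Status == "Awaiting Response"
ghosted     <- no_response & apps$Applied <= as_of - 90

c(no_response = sum(no_response), ghosted = sum(ghosted))
```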


Pie Chart

ggplot(job_list2022, aes(x = "", fill = Status)) +
  geom_bar() +
  coord_polar(theta = "y", direction = 1) +
  theme_void() +
  ggtitle('Current Application Status') +
  scale_fill_brewer(palette = "Blues")

Which platforms did my responses come from? (Responses include rejection emails.)

Join Datasets and Pivot into Longer Format

with_update <- c(which(!is.na(job_list2022$'last update')))
length(with_update)
## [1] 97
job_list2022$Platform <- as.factor(job_list2022$'Found on')

with_res <- job_list2022 %>%
  filter(row_number() %in% with_update) %>%
  group_by(pf = `Found on`) %>%
  summarize(response_count = n()) %>%
  arrange(desc(response_count))

platform_res <- pf_raw %>%
  left_join(with_res, by = "pf") %>%
  mutate("%" = response_count/pf_count*100, "no_res %" = 100 - response_count/pf_count*100)

res_long <- platform_res %>%
  filter(!is.na(response_count)) %>%
  pivot_longer(c("pf_count", "response_count"), names_to = "count_by", values_to = "count") %>%
  select(pf, count_by, count)

ggplot(res_long, aes(x = reorder(pf, count), y = count, fill = count_by)) +
    geom_col(position = position_dodge(width = -0.3)) +
  coord_flip() +
  scale_fill_manual(values = c(response_count = "navy", pf_count = "cornflowerblue")) +
  theme_minimal()

res_long_pct <- platform_res %>%
  filter(!is.na(response_count)) %>%
  pivot_longer(c("%", "no_res %"), names_to = "response", values_to = "pct") %>%
  select(pf, response, pct)

ggplot(res_long_pct, aes(x = "", y = pct, fill = reorder(response, pct))) +
  geom_bar(stat="identity", width=1) +
  coord_polar(theta = "y", direction = 1) +
  theme_void() +
  theme(legend.position = "bottom") +
  facet_wrap(~ pf, ncol = 5) +
  ggtitle('Response Rate by Platform') +
  scale_fill_manual(values = c("%" = "navy", "no_res %" = "azure2")) +
  labs(subtitle = "", fill = NULL, caption = "Responses include rejections")
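The pie facets show each platform's response rate. They hide the denominator, which matters for the conclusion below: one response out of one application is 100%, yet it says little. The base R sketch below computes the rate as the mean of a logical "responded" flag and reports it next to the number of applications. The rows are hypothetical.

```r
# Hypothetical rows: platform, and whether `last update` was ever filled in
apps <- data.frame(
  found_on  = c("LinkedIn", "LinkedIn", "LinkedIn", "Indeed", "Indeed", "Referral"),
  responded = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# The mean of a logical vector is the share of TRUE values, i.e. the response rate
rate <- tapply(apps$responded, apps$found_on, mean) * 100
n    <- table(apps$found_on)

# Report the denominator next to the rate
data.frame(platform     = names(rate),
           applications = as.vector(n[names(rate)]),
           response_pct = round(as.vector(rate), 1))
```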


How Long Do They Need?

I remember one application returned an automated rejection within just a couple of hours. Other times, I received a phone screening invitation within 24 hours, or heard from a company for the first time after more than three months. What can I expect? What if my ideal job contacted me a week after I started another job? Seriously, this is Showee’s Winning Job List, and I need to know what’s going on.

The purpose of (...tbd)
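One possible starting point for this question is the number of days between Applied and last update. The sketch below makes an assumption: that last update reflects the first response. Because the column records the latest update, this is only an approximation. The dates are hypothetical.

```r
# Hypothetical records; using last_update as the first-response date is an assumption
apps <- data.frame(
  Applied     = as.Date(c("2022-03-03", "2022-04-14", "2022-06-01", "2022-07-10")),
  last_update = as.Date(c("2022-03-03", "2022-04-15", NA, "2022-10-20"))
)

days_to_response <- as.numeric(difftime(apps$last_update, apps$Applied, units = "days"))
days_to_response

# Applications without any update stay NA and are left out of the summary
median(days_to_response, na.rm = TRUE)
```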

Conclusion

In sum, even though the denominators are small, applications through referrals and professional communities have the highest response rates, so I might as well focus more on those channels.


How was my analysis? If you liked it or are interested in more of my projects, please check out my project page >>> https://shot.mba/projects.html

If you are interested in my career or in hiring me, please contact me via the links below. My resume can also be found here >>> https://shot.mba/Resume-S.Tachikawa.pdf (linked from https://shot.mba/experience.html)

Comments and feedback are welcome!




Appendix

[Bloopers]

Just for the record, I am listing the visualizations that were not deployed in my final product.


Unaesthetic, Data Not Suitable

These are just ugly; I found that box plots are better suited to larger datasets.


Misleading

I wanted to emphasize the steep curve of applications for Business Analyst roles, but the trendline showed otherwise. Applying a trendline to this data does not really mean anything, though. As with Anscombe’s Quartet, a fitted line can hide very different underlying patterns, and my choice of applications was fairly random. Withdrawn.
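Anscombe’s Quartet ships with base R as the anscombe data set, so the point is easy to check. The four x-y pairs look completely different when plotted, yet ordinary least-squares fits give essentially the same line, with an intercept of about 3 and a slope of about 0.5. A trendline alone does not reveal what the data actually look like.

```r
# Anscombe's Quartet: columns x1..x4 and y1..y4 in the built-in data set
data(anscombe)

# Fit y_i ~ x_i for each of the four pairs and collect intercept and slope
fits <- sapply(1:4, function(i) {
  coef(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]]))
})
colnames(fits) <- paste0("set", 1:4)
rownames(fits) <- c("intercept", "slope")
round(fits, 2)
```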


Misconception

This cute lollipop chart may be better suited to displaying something like duration rather than volume. However, I experimentally placed it behind the bar chart to emphasize the direction of growth.


No Impact as Expected

But just to be fair, I’d like to see the ratio since I decided to apply for Business Analyst roles. So I set the start date to the first day I applied for a Business Analyst role, which is 4/14/2022.

Modifying the Range

da_full %>%
  filter(Date == as.Date("2022-04-14"))
##         Date     Position cum_app
## 1 2022-04-14 Data Analyst       2
kable(c(bd_pp[2, 2]/bd_pp[2, 2], bd_pp[2, 2]/bd_pp[1, 2]), col.names = "Business Analyst : Data Analyst", align = "cc")
Business Analyst : Data Analyst
1.000000
2.089286
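In the kable call above, bd_pp[2, 2]/bd_pp[2, 2] divides a value by itself, so the 1.000000 carries no information. bd_pp[2, 2]/bd_pp[1, 2] is the full-period Data Analyst to Business Analyst ratio, which is the reciprocal of 0.4786325. One way to get a ratio that really starts on 4/14/2022 is to keep only the applications on or after that date and then count each position. The base R sketch below does this on hypothetical records.

```r
# Hypothetical per-application records (Position and Applied date)
apps <- data.frame(
  Position = c("Data Analyst", "Data Analyst", "Data Analyst", "Business Analyst",
               "Data Analyst", "Business Analyst", "Data Analyst"),
  Applied  = as.Date(c("2022-03-03", "2022-03-20", "2022-04-14", "2022-04-14",
                       "2022-05-02", "2022-05-09", "2022-06-01"))
)

start  <- as.Date("2022-04-14")          # first Business Analyst application
recent <- apps[apps$Applied >= start, ]  # drop applications before the start date

n_ba <- sum(recent$Position == "Business Analyst")
n_da <- sum(recent$Position == "Data Analyst")
c(ba_per_da = n_ba / n_da, da_per_ba = n_da / n_ba)
```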


Please check out my visualization collection page >>> https://sho-viz.com/